LONG-TERM GOALS

The  long-term  goal  of  our  project  is  to  develop  and  enhance  research  and  educational 
capabilities  in  the  area  of  coastal  engineering  and  science  at  Louisiana  State  University 
(LSU)  while  simultaneously  supporting  the  Navy’s  research  goals  in  the  area  of  Coastal 
Geosciences.  The  focus  of  the  present  work  is  to  develop  a  new  modeling  framework  for 
simulations  of  coastal  processes  in  deltaic  environments  using  advanced  numerical 
methods  and  high  performance  computing  technology.  In  particular,  the  utilization  of 
adaptive  numerical  methods  such  as  the  spectral  element  method  on  modern  computer 
platforms  with  thousands  of  multi-core  processors  will  enable  coastal  modelers  to 
simulate  complex  physical  processes  with  improved  accuracy  and  efficiency. 

OBJECTIVES


The  specific  objectives  of  this  project  are  to: 

•  Develop  the  capability  of  modeling  coastal  circulation  and  nearshore  surface 
waves  in  deltaic  sedimentary  and  hydrodynamic  environments  in  an  integrated 
modeling  framework  by  extending  the  Boussinesq  theory  for  nearshore 
hydrodynamics  to  muddy  coasts  and  non-hydrostatic  three-dimensional  (3D)  flow 
regimes  with  stratifications. 

•  Complement the Office of Naval Research’s recent research initiatives on Tidal
Flats and Wave-Mud Interactions by integrating the new modeling system with
the field data collected in those programs.

•  Simulate  large-scale,  long-term  problems  in  the  deltaic  environment  by 
integrating  the  application-oriented  modeling  system  with  massive-processor 
computing  facilities  and  technologies  available  at  LSU  and  in  Louisiana. 

•  Quantify  the  generation,  transport  and  dissipation  of  potential  vorticity  in  the  surf 
and  swash  zones,  as  well  as  the  momentum  exchange  between  the  two  dynamical 
regions. 

APPROACH 

The  research  project  consists  of  theoretical  formulation  and  analysis,  the  development 
and  verification  of  an  advanced  modeling  system  using  the  spectral/hp  element  methods 
and  high-performance  computing  technologies,  and  the  utilization  of  the  new  model  as  a 
research  tool  to  advance  knowledge  and  understanding  of  coastal  circulation  and 
nearshore waves in deltaic sedimentary and hydrodynamic environments. Relevant
methodologies from three disciplines (civil engineering, physical oceanography and
computational science) are being utilized. Interdisciplinary interaction takes place
among the investigators in the different fields and through the recruitment of graduate
students from the three disciplines. All project members meet weekly, with meetings
alternating between a focus on coastal science and a focus on computational science.
Code development and document preparation are supported by a project-wide source code
versioning system.

A new approach is taken to meet the objectives of this project: 1) use the Boussinesq
theory to improve the efficiency of non-hydrostatic 3D Navier-Stokes equation solvers and
to extend the applicability of the modeling system to deltaic environments, and 2)
use spectral/hp element methods with unstructured grids to solve the partial
differential equations (PDEs) under realistic deltaic conditions on the high-performance,
massive-processor computers available at LSU.

The  theoretical  derivation  follows  closely  the  approach  of  Dalrymple  and  Liu  (1978)  for 
the treatment of soft mud, and the procedure of Chen et al. (2003) and Chen (2006) for
the treatment of surface waves. The Boussinesq approach is not limited to the
modeling of nonlinear surface waves and breaking-generated currents over a porous or
muddy seabed. An efficient hydrodynamic model for density-stratified flow with a free
surface in the weakly non-hydrostatic regime has been developed by Shen (2001) and
Shen  and  Evans  (2004).  This  formulation  allows  for  applying  the  weakly  non-hydrostatic 
approximation,  similar  to  the  Boussinesq  approach  to  nonlinear  surface  gravity  waves,  to 
strongly  nonlinear  internal  waves  in  the  coastal  ocean  where  the  horizontal  scale  of  the 
density-stratified wave/current motion exceeds the local water depth. The approximation
eliminates the vertical dimension of the elliptic equation that is normally required for
fully non-hydrostatic modeling. As a result, the model’s computational efficiency
increases by a factor proportional to the number of grid points used for vertical
resolution.

The  shallow  water  equations  (SWE)  are  a  set  of  non-linear  hyperbolic  equations.  As  the 
equations  are  derived  under  the  assumption  of  hydrostatic  pressure,  the  SWE  are  only 
valid for long waves. The source terms can contain forcing due to, e.g., bathymetry,
bottom friction, atmospheric pressure, Coriolis force, wind stresses and diffusion. As the
first step of model development, we have focused on the homogeneous version of the
SWE (i.e. without source terms) to assess the performance of the computational core.
Boussinesq-type models for wave-driven currents are computationally demanding.
Accounting for seabed conditions in the new Boussinesq model will increase the
computational effort by a further factor of two. It is therefore desirable to speed up
Boussinesq models for practical applications, in particular for morphological and
ecological simulations. The solution to the growing demand for computing power by
Boussinesq coastal models is the use of high-performance computing (HPC) technologies.
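
For reference, the homogeneous SWE discussed above can be written in the standard two-dimensional conservative form. This is the textbook form for a flat bed with no source terms; the symbols h (total water depth), u and v (depth-averaged velocity components) and g (gravitational acceleration) are notation chosen here for illustration and are not taken from the report.

```latex
\begin{align}
\frac{\partial h}{\partial t}
  + \frac{\partial (hu)}{\partial x}
  + \frac{\partial (hv)}{\partial y} &= 0, \\
\frac{\partial (hu)}{\partial t}
  + \frac{\partial}{\partial x}\left(hu^{2} + \tfrac{1}{2} g h^{2}\right)
  + \frac{\partial (huv)}{\partial y} &= 0, \\
\frac{\partial (hv)}{\partial t}
  + \frac{\partial (huv)}{\partial x}
  + \frac{\partial}{\partial y}\left(hv^{2} + \tfrac{1}{2} g h^{2}\right) &= 0.
\end{align}
```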

WORK  COMPLETED 

Figure 1 illustrates the implementation strategy for the discontinuous Galerkin scheme
using the open-source spectral/hp element library Nektar++ (www.nektar.info). Through
Nektar++, the fundamental routines associated with a high-order finite-element method
are easily accessible. With regard to the high-order discretizations, our work has been
directed towards implementing a solver specific to time-dependent problems. This
includes an SWE class containing functions for the evaluation of the flux vectors,
numerical fluxes, equation-dependent boundary conditions, various source terms, etc.
This class provides an SWE solver library.
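
To make the roles of the "flux vectors" and "numerical fluxes" concrete, the following is a minimal Python sketch of the two ingredients for the homogeneous SWE. It is not the Nektar++ API and not the project's SWE class; the function names, the state ordering (h, hu, hv) and the choice of the Rusanov (local Lax-Friedrichs) flux are assumptions made purely for illustration, since the report does not state which numerical flux is used.

```python
import numpy as np

GRAVITY = 9.81  # assumed gravitational acceleration (m/s^2)


def swe_flux(q):
    """Physical flux vectors of the homogeneous SWE.

    q is the conserved state (h, hu, hv) with h > 0.
    Returns the x-direction and y-direction flux vectors.
    """
    h, hu, hv = q
    u, v = hu / h, hv / h
    pressure = 0.5 * GRAVITY * h * h
    flux_x = np.array([hu, hu * u + pressure, hu * v])
    flux_y = np.array([hv, hu * v, hv * v + pressure])
    return flux_x, flux_y


def rusanov_flux(q_left, q_right, normal):
    """Rusanov numerical flux across an edge with unit normal (nx, ny).

    Hypothetical helper: averages the normal physical fluxes of the two
    neighbouring states and adds dissipation scaled by the fastest wave speed.
    """
    nx, ny = normal
    q_left = np.asarray(q_left, dtype=float)
    q_right = np.asarray(q_right, dtype=float)

    def normal_flux(q):
        flux_x, flux_y = swe_flux(q)
        return flux_x * nx + flux_y * ny

    def max_wave_speed(q):
        h, hu, hv = q
        return abs((hu * nx + hv * ny) / h) + np.sqrt(GRAVITY * h)

    lam = max(max_wave_speed(q_left), max_wave_speed(q_right))
    return 0.5 * (normal_flux(q_left) + normal_flux(q_right)) \
        - 0.5 * lam * (q_right - q_left)
```

In a discontinuous Galerkin scheme the solution is allowed to jump across element edges, so a function of this kind is evaluated on every edge to couple neighbouring elements. When both sides hold the same state, the numerical flux reduces to the physical flux, which is a quick consistency check.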

Our  approach  in  parallelizing  the  SWE  solver  was  to  integrate  our  code  into  the  Cactus 
computational  framework  and  provide  parallelism  through  a  Cactus  “Driver”  module  (or 
“thorn”)  (see  Fig.  1).  Cactus  (www.cactuscode.org)  is  an  open  source  problem  solving 
environment  designed  for  scientists  and  engineers.  Its  modular  structure  easily  enables 
parallel  computation  across  different  architectures  and  collaborative  code  development 
between  different  groups.  Cactus  originated  in  the  academic  research  community,  where 
it  was  developed  and  used  over  many  years  by  a  large  international  collaboration  of 
physicists  and  computational  scientists. 


In  order  to  integrate  the  serial  SWE  solver  into  Cactus,  several  thorns  have  been 
developed.  Most  importantly,  we  have  designed  a  “Nektar++”  thorn  that  initializes  and 
populates  the  data  structures  of  Nektar++.  We  also  provide  a  “SWE”  thorn  that  contains 
the  actual  SWE  solver  based  on  routines  defined  in  the  SWE  solver  library. 

Integrating  the  SWE  solver  in  Cactus  provides  easy  access  to  a  maintainable  parallel 
layer  that  has  been  developed  to  support  unstructured  meshes.  The  parallel  layer  is  a 
special  module  or  “thorn”  in  Cactus  referred  to  as  the  unstructured  mesh  driver 
(UMDriver).  This  separation  of  programming  tasks  enables  coastal  engineers  to  focus  on 
developing  coastal  modeling  code  using  Nektar++  and  the  computational  scientists  to 
focus  on  parallelism  and  performance  of  the  unstructured  mesh  driver. 

The  unstructured  mesh  driver  utilizes  the  Zoltan  (www.cs.sandia.gov/Zoltan/)  library  to 
perform  part  of  its  parallel  operations  such  as  mesh  partitioning  and  load  balancing.  The 
Zoltan  library  contains  a  number  of  tools  that  simplify  the  development  and  improve  the 
performance  of  parallel,  unstructured  and  adaptive  applications.  The  library  is  organized 
as  a  toolkit,  so  that  application  developers  can  use  as  little  or  as  much  of  the  library  as 
desired.  As  the  driver  layer  matures,  it  will  be  less  dependent  on  external  packages  and 
will  be  performing  many  other  tasks  that  are  being  added,  such  as  support  for  adaptive 
mesh  refinement,  a  hyper-slabbing  interface  and  dynamic  load  balancing. 


Figure  1:  Schematic  diagram  for  the  Cactus  and  Nektar++  interface. 


RESULTS 

The major results obtained so far are: 1) the implementation of a new SWE solver based
on the discontinuous Galerkin spectral/hp element method in Nektar++, interfaced with
the Cactus Framework to handle the parallel computing issues, 2) scaling tests on
unstructured meshes for the coupled software, 3) simulations of wave interaction with
five upright cylinders as a test case, and 4) the implementation of the Boussinesq
model (FUNWAVE) for waves and currents on a solid bed in the Cactus Framework
using finite-difference schemes, to serve as a verification tool.

To demonstrate some capabilities and features of the coupled Cactus-Nektar++
implementation of the SWE, preliminary results of a numerical convergence test are
presented in Fig. 2. Consider the simple case of a linear standing wave with a wavelength
of 10 m in a square 10 m by 10 m basin. The still water depth is 0.5 m. To allow
comparison with the analytical solution, we use the linearized SWE. The solution for
one wave period was obtained by numerical integration over 10,000 time steps. Figure 2
shows the error and order of convergence measured in the L2 norm.
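
The parameters of this test case follow from linear long-wave theory, and the short Python sketch below works them out. It assumes g = 9.81 m/s^2, a wave varying along x only, and a hypothetical amplitude of 0.01 m; none of these values are stated in the report. The function `standing_wave` is an illustrative name for the textbook analytical solution of the linearized SWE in a closed basin, not the project's code.

```python
import math

GRAVITY = 9.81      # assumed gravitational acceleration (m/s^2)
DEPTH = 0.5         # still water depth from the test case (m)
WAVELENGTH = 10.0   # wavelength from the test case (m)
N_STEPS = 10_000    # time steps per wave period from the test case

wave_speed = math.sqrt(GRAVITY * DEPTH)   # long-wave celerity c = sqrt(g h)
period = WAVELENGTH / wave_speed          # wave period T = L / c
dt = period / N_STEPS                     # implied time step
wavenumber = 2.0 * math.pi / WAVELENGTH
omega = 2.0 * math.pi / period


def standing_wave(x, t, amplitude=0.01):
    """Analytical standing wave of the linearized SWE (amplitude is hypothetical).

    Returns the free surface elevation eta and the depth-averaged velocity u.
    The velocity vanishes at x = 0 and x = WAVELENGTH, i.e. at the basin walls.
    """
    eta = amplitude * math.cos(wavenumber * x) * math.cos(omega * t)
    u = (amplitude * wave_speed / DEPTH) \
        * math.sin(wavenumber * x) * math.sin(omega * t)
    return eta, u


print(f"c = {wave_speed:.3f} m/s, T = {period:.3f} s, dt = {dt:.2e} s")
```

Under these assumptions the wave period is roughly 4.5 s, so 10,000 steps per period correspond to a time step of about 0.45 ms. A time step this small keeps the temporal error low, so that the measured error mainly reflects the spatial discretization.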

Exponential Convergence Results: Spectral/hp element methods provide dual paths to
convergence: h-refinement and p-refinement. Here p-refinement refers to increasing the
polynomial order of the basis functions within the elements, while h-refinement refers to
decreasing the mesh size, i.e. increasing the number of elements and nodes. The key
feature of spectral/hp elements is that p-refinement gives rise to exponentially fast
convergence, as illustrated in Fig. 2 for the standing wave case. The numerical errors
decrease exponentially as the order of the basis functions increases, until they reach a
plateau.
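
The effect of raising the polynomial order can be illustrated independently of the SWE solver. The Python sketch below approximates a smooth function, cos(pi x) on a single element [-1, 1], by least-squares Legendre expansions of increasing order and reports the root-mean-square error. It is only an approximation-theory illustration under these assumptions, not a reproduction of Fig. 2; the function name `p_refinement_errors` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def p_refinement_errors(orders, n_sample=400):
    """RMS error of Legendre least-squares fits to cos(pi x) on [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n_sample)
    target = np.cos(np.pi * x)
    errors = []
    for p in orders:
        coeffs = legendre.legfit(x, target, p)   # expansion of order p
        approx = legendre.legval(x, coeffs)      # evaluate the expansion
        errors.append(float(np.sqrt(np.mean((approx - target) ** 2))))
    return errors


# Even orders only: the target is an even function, so odd modes add nothing.
orders = [2, 4, 6, 8, 10, 12]
errors = p_refinement_errors(orders)
for p, err in zip(orders, errors):
    print(f"p = {p:2d}  RMS error = {err:.3e}")
```

Plotted on a logarithmic error axis against p, these errors fall on a steeply descending, roughly straight line, which is the signature of exponential convergence. In a full solver the curve eventually flattens once time-stepping or round-off errors dominate.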

To assess the parallel performance of the software on multi-processor systems, we
carry out two types of scaling tests: weak scaling and strong scaling. In a weak scaling
test, the computational work per processor (or core) is held constant, so the total
problem size grows with the number of processors. For ideal weak scaling, the time to
solution should not increase significantly when the problem size is increased in
proportion to the number of processors. A strong scaling test, on the other hand, uses a
fixed problem size and measures the time to solution as the number of processors
increases. For ideal strong scaling, the time to solution should continue to decrease as
processors are added. All tests were performed on the “QueenBee” supercomputer
(www.loni.org), which has 680 nodes, each with two quad-core processors (i.e. 8 cores)
and 8 GB of RAM.
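
The two tests are commonly summarized by a parallel efficiency. The Python sketch below uses the conventional definitions; the report does not state the exact formula behind its efficiency figures, so these definitions, the function names and the timings in the usage lines are all assumptions for illustration only.

```python
def weak_scaling_efficiency(t_base, t_n):
    """Weak scaling: work per core is fixed, so the ideal is t_n == t_base."""
    return t_base / t_n


def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Strong scaling: total work is fixed, so the ideal is t_n == t_base * n_base / n."""
    return (t_base * n_base) / (t_n * n)


# Hypothetical timings in seconds, not measurements from this project.
print(weak_scaling_efficiency(t_base=10.0, t_n=12.5))
print(strong_scaling_efficiency(t_base=100.0, n_base=8, t_n=7.0, n=128))
```

An efficiency of 1.0 corresponds to ideal scaling in both cases. Values below 1.0 indicate overhead from communication, load imbalance or serial sections.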


Figure  2:  Illustration  of  exponential  convergence 


Weak Scaling Results: Tests have been carried out for two types of meshes, consisting of
100 and 900 quadrilaterals per core, respectively. Both sets are run with three different
polynomial orders: p = 4, 6 and 8. Thus, the largest run contains 218,700 unique degrees
of freedom per core, or a total of roughly 28 million degrees of freedom. Total execution
time as well as solution time for the model is recorded for one hundred time steps. Figure
3 (top) shows the solution times (excluding I/O, initialization and partitioning of the
mesh) for the mesh with 900 quadrilaterals per core. The results from the mesh with 100
quadrilateral elements per core were as expected and are therefore not shown here for the
sake of brevity.
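
The quoted degree-of-freedom counts can be reproduced with a short calculation. The Python sketch below assumes a tensor-product quadrilateral expansion with (p + 1)^2 modes per element, three unknowns per mode (h, hu, hv) and 128 cores for the largest run; these assumptions are not spelled out in the report but are consistent with the figures it quotes. The function name `dofs_per_core` is hypothetical.

```python
def dofs_per_core(elements_per_core, order, n_variables=3):
    """Degrees of freedom per core for a discontinuous quadrilateral expansion."""
    modes_per_element = (order + 1) ** 2
    return elements_per_core * modes_per_element * n_variables


largest = dofs_per_core(elements_per_core=900, order=8)
print(largest)          # 218700
print(largest * 128)    # 27993600, i.e. roughly 28 million
```

In a discontinuous Galerkin discretization, degrees of freedom are not shared between neighbouring elements, which is why the count is a simple product.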

We notice that the total wall clock time to completion for all polynomial orders increases
as we increase the number of processors and, correspondingly, the size of the modeling
domain (i.e. a weak-scaling test). The parallelization efficiency drops from 95% to
78% as the number of cores increases from 2 to 128. Additional efforts will be made to
improve the scalability and performance by redesigning the problem setup
routines and mesh partitioning routines in the next round of code refinement and
optimization.

Figure 3: Parallel performance: Weak Scaling (top) and Strong Scaling (bottom).


Strong Scaling Results: The final computational load for 128 processors is about 100
elements per core and for the single node (or 8 cores) it translates to 1600 elements per
node. Figure 3 (bottom) shows the reduction in solution time for a fixed-size problem as
the number of processors increases. Various tests have been conducted by changing the
polynomial order while keeping the same underlying geometric mesh. Although there is a
slight reduction in parallelization efficiency from 64 to 128 cores, these results are
highly encouraging: coastal scientists can now easily run their existing single-processor
simulations on several hundred processors to reduce the total computational time and
solve practical problems efficiently.


Figure  4:  Unstructured  triangular  element  mesh  around  five  cylinders  showing  the 
details  of  collocation  points  in  the  interior  of  the  elements  for  sixth  order  basis  functions. 
Different  colors  illustrate  the  domain  decomposition  with  each  color  representing  the 
partitioned  mesh  belonging  to  each  processor. 


Wave Interaction with Five Upright Cylinders: As a numerical example to check that the
simulated flow physics is sound, wave propagation past, and interaction with, five
upright cylinders is simulated. This example is complex enough to be relevant yet
simple enough to be used for debugging. Figure 4 shows the domain
decomposition of an unstructured mesh as well as the details of collocation points inside
the spectral elements (inset). Unstructured meshes can easily adapt to complicated
geometric and/or flow features using the h/p refinement ability of the spectral element
method. As a sample result, computed free surface elevations are presented at two
different time instants in Figure 5. Wave runup on the cylinders as well as
wave scattering and diffraction by the group of cylinders are well reproduced by the
parallel code.


Figure 5: Modeled free surface interacting with five cylinders. Left: t = 9.9 s; right: t = 20 s.

IMPACT/APPLICATIONS 

We have successfully implemented a spectral/hp discontinuous Galerkin method for
solving the SWE and interfaced it with the Cactus computational framework. We have
obtained encouraging results in terms of parallel performance and the coupled software’s
scaling ability. There are still some unresolved issues in reducing the overheads
associated with ghost elements for communication between sub-domains. The use of
Cactus provides a path for extensibility, for integrating with cutting-edge computational
hardware and cyberinfrastructure, and for building a comprehensive toolkit for coastal
applications. We have also demonstrated the relevance of simulated flow physics to the
coastal modeling community.

The  research  is  expected  to  improve  the  Navy’s  capability  of  modeling  nearshore  surface 
waves  and  coastal  processes  in  heterogeneous  sedimentary  environments.  First,  the  study 
will  extend  the  applicability  of  Boussinesq  models  (Chen  et  al.  1999  and  Chen  et  al. 
2003)  to  the  porous  and  soft  mud  seabed.  This  will  provide  sediment  transport  models 
with  more  realistic  estimates  of  cross-shore  and  alongshore  velocities  in  coastal  regions 
with substantial variation in seabed properties. Improvements in predicting
littoral sediment transport are therefore anticipated. A better prediction of turbidity in coastal
regions  is  of  importance  to  naval  deployments  of  unmanned  underwater  vehicles  (UUV) 
and  divers  for  inshore  countermine  warfare.  The  modeling  framework  integrated  with  the 
CFD  Toolkits  developed  at  LSU  will  allow  us  to  couple  the  hydrodynamic  models  with 
sediment  transport  models  for  coastal  morphodynamic  studies. 

In addition to supporting the Navy’s research goals, the proposed project will
contribute to Louisiana State University’s mission of research and graduate
education. A survey conducted by the National Research Council has shown that the
northern Gulf Coast, where the Naval Research Laboratory and other naval facilities are
located, is in need of research and education in coastal engineering. Thus, the proposed
training of the post-doctoral fellow and graduate students will enhance the graduate
program in coastal engineering at LSU to meet the need for graduate education on the
Gulf Coast in support of national defense.


TRANSITIONS 


None 

RELATED  PROJECTS 

Our project is leveraging and coordinating with several other ongoing
activities:

XiRel:  This  NSF  funded  project  is  optimizing  and  extending  an  Adaptive  Mesh 
Refinement  layer  for  the  Cactus  framework,  which  will  be  used  for  our  structured  grid 
codes.  (http://www.cactuscode.org/Development/xirel) 

ALPACA:  This  NSF  funded  project  is  developing  debugging  and  profiling  tools  for  the 
Cactus  framework  which  will  support  the  Coastal  Modeling  Framework  developed  in  this 
project.  (http://www.cactuscode.org/Development/alpaca) 

CyberTools: This NSF/BOR-funded project is developing a cyberinfrastructure across the
100 TFlop machines of the Louisiana Optical Network Initiative. Our project is providing
one of the application drivers for this infrastructure. (http://cybertools.loni.org)

CFD IGERT: An NSF graduate education program at LSU that trains
students in computational fluid dynamics and high performance computing. Several
research projects are building on the CFD Toolkit, which is contributing to our project.

SCOOP:  Where  appropriate,  our  models  will  be  integrated  into  the  community 
infrastructure  of  the  NOAA/ONR  funded  SURA  Coastal  Ocean  and  Observing  Program. 
SCOOP maintains a coastal archive at LSU with real-time forcing and simulation data for
storm events.

NSF-CAREER: This five-year research project is focused on simulations of nonlinear
coastal waves and air-sea momentum fluxes, which complements the present research
project. (http://www.nsf.gov/eng/cbet/nuggets/1443/1443_chen.htm)

The Office of Naval Research’s new research initiatives on Tidal Flats and Wave-Mud
Interactions are closely related to the present study. They provide the project with an
excellent opportunity to combine the modeling efforts with the field studies.